Urdu and Hindi: Translation and sharing of linguistic resources
Abstract
Hindi and Urdu share a common phonology, morphology and grammar but are written in different scripts. In addition, their vocabularies have diverged significantly, especially in the written form. In this paper we show that we can get reasonable-quality translations between the two languages (we estimate the Translation Error Rate at 18%) even in the absence of a parallel corpus. Linguistic resources such as treebanks, part-of-speech tagged data and parallel corpora with English are limited for both languages. We use the translation system to share linguistic resources between the two languages and demonstrate improvements on three tasks: statistical machine translation from Urdu to English is improved (by 0.8 BLEU) by using a Hindi-English parallel corpus; Hindi part-of-speech tagging is improved (by up to 6% absolute) by using an Urdu part-of-speech corpus; and a Hindi-English word aligner is improved (by up to 9% absolute in F-measure) by using a manually word-aligned Urdu-English corpus.
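The Translation Error Rate quoted in the abstract is an edit-based metric: the number of edits needed to turn a system output into a reference translation, divided by the reference length. The sketch below is a simplified illustration only, not the authors' evaluation code. It computes a word-level edit rate using insertions, deletions and substitutions, and it does not model the block shifts that full TER also counts. The function name `word_edit_rate` and the example token strings are hypothetical.

```python
def word_edit_rate(hypothesis: str, reference: str) -> float:
    """Word-level edit distance divided by reference length.

    Simplification (assumption): no block shifts, unlike full TER.
    """
    hyp = hypothesis.split()
    ref = reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the hypothesis prefix
    # processed so far and the first j reference words.
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]  # turning i hypothesis words into an empty reference
        for j, r in enumerate(ref, start=1):
            substitution = prev[j - 1] + (0 if h == r else 1)
            drop_hyp_word = prev[j] + 1
            add_ref_word = curr[j - 1] + 1
            curr.append(min(substitution, drop_hyp_word, add_ref_word))
        prev = curr

    return prev[-1] / len(ref)


# One substituted word out of four reference words.
rate = word_edit_rate("w1 w2 w3 w4", "w1 w2 wX w4")
print(f"{rate:.2f}")
```

Under this simplified measure, a rate of 0.18 would correspond to roughly 18 edits per 100 reference words.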
Similar resources
A House United: Bridging the Script and Lexical Barrier between Hindi and Urdu
In Computational Linguistics, Hindi and Urdu are not viewed as a monolithic entity and have received separate attention with respect to their text processing. From part-of-speech tagging to machine translation, models are separately trained for both Hindi and Urdu despite the fact that they represent the same language. The reasons mainly are their divergent literary vocabularies and separate or...
Improving Machine Translation via Triangulation and Transliteration
In this paper we improve Urdu→Hindi⇄English machine translation through triangulation and transliteration. First we built an Urdu→Hindi SMT system by inducing triangulated and transliterated phrase-tables from Urdu–English and Hindi–English phrase translation models. We then use it to translate the Urdu part of the Urdu-English parallel data into Hindi, thus creating an artificial Hindi-English...
Developing English-Urdu Machine Translation Via Hindi
The paper presents a strategy for deriving English-to-Urdu translation using an English-to-Hindi MT system. The English-Hindi lexical database is used to collect all possible Hindi words and phrases. These are further augmented by including their morphological variations and attaching all possible postpositions. This list is used to provide a mapping from Hindi to Urdu. There may be change in gender...
Movement and Intervention Effects: Evidence from Hindi/Urdu
Title of Document: MOVEMENT AND INTERVENTION EFFECTS: EVIDENCE FROM HINDI/URDU. Shiti Malhotra, Doctor of Philosophy, 2011. Directed by: Professor Norbert Hornstein and Professor Howard Lasnik, Department of Linguistics. The purpose of this dissertation is to explore the nature of intervention effects seen in various constructions like Wh-scope marking, raising and passivization. In particular, this d...
Urdu Hindi Machine Transliteration using SMT
Transliteration is a process of transcribing a word of the source language into the target language such that when the native speaker of the target language pronounces it, it sounds as the native pronunciation of the source word. Statistical techniques have brought significant advances and have made real progress in various fields of Natural Language Processing (NLP). In this paper, we have ana...